“Datapasta” for easy copy and paste in R
Introduction
Struggling to copy a table from the web and format it correctly in R? Don’t worry: datapasta is here to simplify the process. This handy package lets you bring copied data into R without extensive coding. It is best suited to small tables rather than large datasets, but it is a huge time-saver for quick tasks. Let’s walk through how to use it step by step.
For this demonstration, we’ll fetch a table from the following site: click here
Let us begin by installing and loading the required package:
install.packages("datapasta")
library(datapasta)
Once datapasta is installed, restart R so that it appears under Addins in RStudio.
Step 1: Select and copy table
Open the demonstration page mentioned above, select the whole table, and copy it to the clipboard.
Note: The table can be copied from a website, a Word document, a spreadsheet, a CSV file, or structured text. However, datapasta has no built-in support for working directly with PDF files. (In case you need to extract tables from a PDF, refer to our previous article: Importing and extracting tables from PDF into R using “pdftools”.)
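To make the idea concrete, here is a minimal base R sketch of what a copied table looks like as text and how it can be parsed. It is an illustration only, not datapasta’s implementation: it assumes the copied text is tab-separated with one row per line (which is what browsers and spreadsheets commonly place on the clipboard), and the names copied_text and parsed_table are hypothetical.

```r
# Text as it commonly arrives from a copied table: one row per line,
# cells separated by tabs. The three rows are taken from the demo table.
copied_text <- paste(
  "Company\tContact\tCountry",
  "Alfreds Futterkiste\tMaria Anders\tGermany",
  "Ernst Handel\tRoland Mendel\tAustria",
  "Island Trading\tHelen Bennett\tUK",
  sep = "\n"
)

# read.delim() treats the first line as the header by default.
parsed_table <- read.delim(
  text = copied_text,
  sep = "\t",
  stringsAsFactors = FALSE
)

# Inspect the column names and types of the parsed table.
str(parsed_table)
```

datapasta spares you this manual step and, in addition, writes out the R code that recreates the table.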
Step 2: Paste the data in R
Go to Addins and select the option of your choice, e.g. “Paste as tribble” or “Paste as data.frame”.
# datapasta writes the call as tibble::tribble(), as shown below; if it does not, load tibble explicitly with library(tibble)
Table_as_tribble <- tibble::tribble(
  ~Company, ~Contact, ~Country,
  "Alfreds Futterkiste", "Maria Anders", "Germany",
  "Centro comercial Moctezuma", "Francisco Chang", "Mexico",
  "Ernst Handel", "Roland Mendel", "Austria",
  "Island Trading", "Helen Bennett", "UK",
  "Laughing Bacchus Winecellars", "Yoshi Tannamuri", "Canada",
  "Magazzini Alimentari Riuniti", "Giovanni Rovelli", "Italy"
)
print(Table_as_tribble)
# A tibble: 6 × 3
  Company                      Contact          Country
  <chr>                        <chr>            <chr>
1 Alfreds Futterkiste          Maria Anders     Germany
2 Centro comercial Moctezuma   Francisco Chang  Mexico
3 Ernst Handel                 Roland Mendel    Austria
4 Island Trading               Helen Bennett    UK
5 Laughing Bacchus Winecellars Yoshi Tannamuri  Canada
6 Magazzini Alimentari Riuniti Giovanni Rovelli Italy
# Paste as dataframe
Table_as_df <- data.frame(
  stringsAsFactors = FALSE,
  Company = c("Alfreds Futterkiste",
              "Centro comercial Moctezuma", "Ernst Handel", "Island Trading",
              "Laughing Bacchus Winecellars",
              "Magazzini Alimentari Riuniti"),
  Contact = c("Maria Anders", "Francisco Chang",
              "Roland Mendel", "Helen Bennett",
              "Yoshi Tannamuri", "Giovanni Rovelli"),
  Country = c("Germany", "Mexico", "Austria", "UK",
              "Canada", "Italy")
)
print(Table_as_df)
                       Company          Contact Country
1          Alfreds Futterkiste     Maria Anders Germany
2   Centro comercial Moctezuma  Francisco Chang  Mexico
3                 Ernst Handel    Roland Mendel Austria
4               Island Trading    Helen Bennett      UK
5 Laughing Bacchus Winecellars  Yoshi Tannamuri  Canada
6 Magazzini Alimentari Riuniti Giovanni Rovelli   Italy
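The same conversions can also be triggered from code rather than from the Addins menu. The sketch below assumes that tribble_paste() and df_paste() accept an existing data frame through their first argument (input_table), in which case the clipboard is not read; small_table is a hypothetical stand-in for data you would normally copy. Called with no arguments, the same functions are expected to read the clipboard instead.

```r
library(datapasta)

# Hypothetical stand-in for a copied table (two rows from the demo data).
small_table <- data.frame(
  Company = c("Ernst Handel", "Island Trading"),
  Country = c("Austria", "UK"),
  stringsAsFactors = FALSE
)

# Emit the tribble() definition that recreates small_table.
tribble_paste(small_table)

# Emit the data.frame() definition instead.
df_paste(small_table)
```

This is convenient when you want a reproducible definition of a small table that already exists in your session.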
Beyond Pasting Data: Aligning and Formatting Made Easy
datapasta also helps with code you have already typed. For instance, imagine you’ve written a long vector but forgot to add quotes around the elements. Adding the quotes manually is time-consuming. Consider this example:
c(Germany, Mexico, Austria, UK, Canada, Italy)
This will result in the error: “Error: object ‘Germany’ not found”
With datapasta, you can quickly fix such issues, saving both time and effort.
# Select the vector, then go to Addins and, under datapasta, choose "Toggle Vector Quotes"
# Voilà, your vector is now quoted correctly
c("Germany","Mexico","Austria","UK","Canada","Italy")
[1] "Germany" "Mexico" "Austria" "UK" "Canada" "Italy"
# Similarly, "Fiddle Selection" realigns your code, here with one element per line
c("Germany",
  "Mexico",
  "Austria",
  "UK",
  "Canada",
  "Italy")
[1] "Germany" "Mexico" "Austria" "UK" "Canada" "Italy"
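When the addin is not available (for example, outside RStudio), a similar result can be had with base R. This is a hedged sketch rather than what datapasta does internally: it assumes the unquoted elements are available as one comma-separated string, and bare_text and countries are hypothetical names.

```r
# The unquoted elements, held as a single comma-separated string.
bare_text <- "Germany, Mexico, Austria, UK, Canada, Italy"

# Split on commas and strip the surrounding whitespace.
countries <- trimws(strsplit(bare_text, ",", fixed = TRUE)[[1]])

# dput() prints R code that recreates the vector, quotes included.
dput(countries)
# c("Germany", "Mexico", "Austria", "UK", "Canada", "Italy")
```

The printed line can be pasted back into a script in place of the unquoted vector.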
Step 3: Set shortcuts in Addins
- Go to Tools > Modify Keyboard Shortcuts… > find the datapasta addin you want > click its Shortcut column and enter the key combination > Apply
Conclusion:
When it comes to copying and pasting small datasets or converting data into R-friendly formats, the datapasta package is a priceless tool for streamlining data operations in R.
Benefits:
Time-Saving
Ease of Use
Cons:
Limited to Small Datasets
Niche Use Case: Its primary focus on quick data entry means it is less relevant for those who work with larger datasets or established pipelines.